

Responsible LLM Deployment for High-Stake Decisions by Decentralized Technologies and Human-AI Interactions

Sachan, Swati, Miller, Theo, Nguyen, Mai Phuong

arXiv.org Artificial Intelligence

High-stakes decision domains are increasingly exploring the potential of Large Language Models (LLMs) for complex decision-making tasks. However, LLM deployment in real-world settings presents challenges in data security, evaluation of its capabilities outside controlled environments, and accountability attribution in the event of adversarial decisions. This paper proposes a framework for the responsible deployment of LLM-based decision-support systems through active human involvement. It integrates interactive collaboration between human experts and developers through multiple iterations at the pre-deployment stage to assess uncertain samples and judge the stability of the explanations provided by post-hoc XAI techniques. Local LLM deployment within organizations and decentralized technologies, such as Blockchain and IPFS, are proposed to create immutable records of LLM activities for automated auditing, enhancing security and enabling accountability to be traced back. The framework was tested on BERT-large-uncased, Mistral, and LLaMA 2 and 3 models to assess their capability to support responsible financial decisions on business lending.
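The abstract does not detail how the Blockchain/IPFS records are structured, so the following is only a minimal stdlib-only sketch of the tamper-evidence idea: each logged LLM decision carries a hash of the previous record, so any later edit breaks the chain. The field names and `AuditChain` class are illustrative assumptions, not the paper's design.

```python
# Hypothetical hash-chained audit log for LLM decisions (toy stand-in for the
# Blockchain/IPFS layer the paper proposes; names here are assumptions).
import hashlib
import json
import time

def record_hash(record: dict) -> str:
    """Deterministic SHA-256 over a canonical JSON encoding."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

class AuditChain:
    def __init__(self):
        self.entries = []

    def log_decision(self, model: str, prompt: str, output: str, reviewer: str):
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        record = {
            "timestamp": time.time(),
            "model": model,
            "prompt": prompt,
            "output": output,
            "reviewer": reviewer,   # human expert who signed off
            "prev_hash": prev,      # links records so tampering is detectable
        }
        record["hash"] = record_hash(record)
        self.entries.append(record)

    def verify(self) -> bool:
        """Recompute every hash and link; any edit breaks the chain."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if e["prev_hash"] != prev or record_hash(body) != e["hash"]:
                return False
            prev = e["hash"]
        return True

chain = AuditChain()
chain.log_decision("llama-3-8b", "Assess loan application #1042", "Approve", "analyst_a")
assert chain.verify()
```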



Deep Pathomic Learning Defines Prognostic Subtypes and Molecular Drivers in Colorectal Cancer

Wang, Zisong, Wang, Xuanyu, Chen, Hang, Wang, Haizhou, Chen, Yuxin, Xu, Yihang, Yuan, Yunhe, Luo, Lihuan, Ling, Xitong, Liu, Xiaoping

arXiv.org Artificial Intelligence

Precise prognostic stratification of colorectal cancer (CRC) remains a major clinical challenge due to its high heterogeneity. The conventional TNM staging system is inadequate for personalized medicine. We aimed to develop and validate a novel multiple instance learning model, TDAM-CRC, using histopathological whole-slide images for accurate prognostic prediction and to uncover its underlying molecular mechanisms. We trained the model on the TCGA discovery cohort (n=581), validated it in an independent external cohort (n=1031), and further integrated multi-omics data to improve model interpretability and identify novel prognostic biomarkers. The results demonstrated that TDAM-CRC achieved robust risk stratification in both cohorts. Its predictive performance significantly outperformed the conventional clinical staging system and multiple state-of-the-art models. The TDAM-CRC risk score was confirmed as an independent prognostic factor in multivariable analysis. Multi-omics analysis revealed that the high-risk subtype is closely associated with metabolic reprogramming and an immunosuppressive tumor microenvironment. Through interaction network analysis, we identified and validated Mitochondrial Ribosomal Protein L37 (MRPL37) as a key hub gene linking deep pathomic features to clinical prognosis. We found that high expression of MRPL37, driven by promoter hypomethylation, serves as an independent biomarker of favorable prognosis. Finally, we constructed a nomogram incorporating the TDAM-CRC risk score and clinical factors to provide a precise and interpretable clinical decision-making tool for CRC patients. Our AI-driven pathological model TDAM-CRC provides a robust tool for improved CRC risk stratification, reveals new molecular targets, and facilitates personalized clinical decision-making.
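The abstract does not specify TDAM-CRC's internals, so this is only a generic sketch from the same family of methods: attention-based multiple instance learning, which pools pre-extracted patch features from a whole-slide image into one slide-level risk score. All dimensions and layer choices are illustrative assumptions.

```python
# Minimal attention-based MIL head for slide-level risk prediction (PyTorch).
# A generic sketch of the paradigm, not the authors' TDAM-CRC architecture.
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, feat_dim=1024, hidden=256):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )
        self.head = nn.Linear(feat_dim, 1)      # slide-level risk score

    def forward(self, patches):                 # patches: (num_patches, feat_dim)
        a = torch.softmax(self.attn(patches), dim=0)  # attention over patches
        slide = (a * patches).sum(dim=0)              # weighted slide embedding
        return self.head(slide), a.squeeze(-1)        # score + patch weights

model = AttentionMIL()
feats = torch.randn(500, 1024)   # e.g., 500 pre-extracted patch embeddings
risk, attn = model(feats)        # attn highlights prognostically relevant tissue
```

The attention weights are what make such models interpretable: high-weight patches can be mapped back onto the slide for pathologist review.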


Guided by the Experts: Provable Feature Learning Dynamic of Soft-Routed Mixture-of-Experts

Liao, Fangshuo, Kyrillidis, Anastasios

arXiv.org Artificial Intelligence

Mixture-of-Experts (MoE) architectures have emerged as a cornerstone of modern AI systems. In particular, MoEs route inputs dynamically to specialized experts whose outputs are aggregated through weighted summation. Despite their widespread application, theoretical understanding of MoE training dynamics remains limited to either separate expert-router optimization or only top-1 routing scenarios with carefully constructed datasets. This paper advances MoE theory by providing convergence guarantees for joint training of soft-routed MoE models with non-linear routers and experts in a student-teacher framework. We prove that, with moderate over-parameterization, the student network undergoes a feature learning phase, where the router's learning process is "guided" by the experts, that recovers the teacher's parameters. Moreover, we show that a post-training pruning can effectively eliminate redundant neurons, followed by a provably convergent fine-tuning process that reaches global optimality. To our knowledge, our analysis is the first to bring novel insights into the optimization landscape of the MoE architecture.
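To make the analyzed object concrete, here is a minimal soft-routed MoE layer matching the abstract's setup: a non-linear router produces per-expert weights and the expert outputs are aggregated by weighted summation (every expert contributes, unlike top-1 routing). Dimensions and activations are illustrative assumptions.

```python
# Soft-routed MoE forward pass: softmax router weights times expert outputs.
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    def __init__(self, d_in, d_out, num_experts=4, hidden=32):
        super().__init__()
        # non-linear router, as in the paper's setting
        self.router = nn.Sequential(
            nn.Linear(d_in, hidden), nn.Tanh(), nn.Linear(hidden, num_experts)
        )
        # non-linear experts
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU())
            for _ in range(num_experts)
        )

    def forward(self, x):                         # x: (batch, d_in)
        w = torch.softmax(self.router(x), dim=-1)             # (batch, E)
        outs = torch.stack([e(x) for e in self.experts], 1)   # (batch, E, d_out)
        return (w.unsqueeze(-1) * outs).sum(dim=1)            # weighted summation

layer = SoftMoE(16, 8)
y = layer(torch.randn(32, 16))   # joint router-expert training flows through w
```

Because the routing is soft, gradients reach the router through every expert's output, which is the coupling the paper's "guided by the experts" analysis formalizes.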


QU-NLP at QIAS 2025 Shared Task: A Two-Phase LLM Fine-Tuning and Retrieval-Augmented Generation Approach for Islamic Inheritance Reasoning

AL-Smadi, Mohammad

arXiv.org Artificial Intelligence

This paper presents our approach and results for SubTask 1: Islamic Inheritance Reasoning at QIAS 2025, a shared task focused on evaluating Large Language Models (LLMs) in understanding and reasoning within Islamic inheritance knowledge. We fine-tuned the Fanar-1-9B causal language model using Low-Rank Adaptation (LoRA) and integrated it into a Retrieval-Augmented Generation (RAG) pipeline. Our system addresses the complexities of Islamic inheritance law, including comprehending inheritance scenarios, identifying eligible heirs, applying fixed-share rules, and performing precise calculations. Our system achieved an accuracy of 0.858 on the final test, outperforming competitive models such as GPT-4.5, LLaMA, Fanar, Mistral, and ALLaM, all evaluated with zero-shot prompting. Our results demonstrate that QU-NLP achieves near state-of-the-art accuracy (85.8%), excelling especially on advanced reasoning (97.6%), where it outperforms Gemini 2.5 and OpenAI's o3. This highlights that domain-specific fine-tuning combined with retrieval grounding enables mid-scale Arabic LLMs to surpass frontier models in Islamic inheritance reasoning.
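A hedged sketch of the two-phase recipe the abstract describes: attach LoRA adapters to a causal LM for phase-one fine-tuning, then prepend retrieved passages at inference. The model id, LoRA hyperparameters, and the `retrieve` callable below are placeholders, not the authors' exact settings.

```python
# Phase 1: LoRA fine-tuning (standard peft usage); Phase 2: RAG prompting.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = "Fanar-1-9B"  # placeholder id; substitute the actual checkpoint path
model = AutoModelForCausalLM.from_pretrained(base)

lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"],  # assumed attention projections
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)   # only the low-rank adapters are trained

def rag_prompt(question, retrieve, k=3):
    # Phase 2: ground the fine-tuned model in retrieved inheritance-law passages.
    # `retrieve` is a hypothetical callable returning the top-k relevant texts.
    passages = retrieve(question, k)
    return "Context:\n" + "\n".join(passages) + f"\n\nQuestion: {question}\nAnswer:"
```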


The First Differentiable Transfer-Based Algorithm for Discrete MicroLED Repair

Lue, Ning-Yuan

arXiv.org Artificial Intelligence

Laser-enabled selective transfer, a key process in high-throughput microLED fabrication, requires computational models that can plan shift sequences to minimize motion of XY stages and adapt to varying optimization objectives across the substrate. We propose the first repair algorithm based on a differentiable transfer module designed to model discrete shifts of transfer platforms, while remaining trainable via gradient-based optimization. Compared to local proximity searching algorithms, our approach achieves superior repair performance and enables more flexible objective designs, such as minimizing the number of steps. Unlike reinforcement learning (RL)-based approaches, our method eliminates the need for handcrafted feature extractors and trains significantly faster, allowing scalability to large arrays. Experiments show a 50% reduction in transfer steps and sub-2-minute planning time on 2000x2000 arrays. This method provides a practical and adaptable solution for accelerating microLED repair in AR/VR and next-generation display fabrication.
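The abstract leaves the transfer module's construction unspecified; one plausible reading of "discrete shifts that remain trainable via gradient-based optimization" is a straight-through estimator, sketched below as a toy. The paper's actual mechanism may differ.

```python
# Toy straight-through rounding: forward pass snaps a continuous shift to an
# integer platform step; backward pass lets gradients flow to the continuous
# parameter. Illustrative only; not the paper's actual transfer module.
import torch

class RoundSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)          # discrete shift actually executed

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out                # straight-through gradient

shift = torch.tensor([2.3, -1.7], requires_grad=True)
discrete = RoundSTE.apply(shift)       # tensor([ 2., -2.]), still differentiable
loss = (discrete ** 2).sum()           # stand-in objective, e.g., stage motion
loss.backward()                        # gradients reach the continuous shifts
```

This is what would let an objective such as "minimize the number of steps" be optimized directly with gradient descent despite the discreteness of the hardware moves.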


Context-Enhanced Contrastive Search for Improved LLM Text Generation

Sen, Jaydip, Pandey, Rohit, Waghela, Hetvi

arXiv.org Artificial Intelligence

Recently, Large Language Models (LLMs) have demonstrated remarkable advancements in Natural Language Processing (NLP). However, generating high-quality text that balances coherence, diversity, and relevance remains challenging. Traditional decoding methods, such as beam search and top-k sampling, often struggle with either repetitive or incoherent outputs, particularly in tasks that require long-form text generation. To address these limitations, this paper proposes a novel enhancement of the well-known Contrastive Search algorithm: Context-Enhanced Contrastive Search (CECS) with contextual calibration. The proposed scheme introduces several novelties, including dynamic contextual importance weighting, multi-level Contrastive Search, and adaptive temperature control, to optimize the balance between fluency, creativity, and precision. The performance of CECS is evaluated using several standard metrics such as BLEU, ROUGE, and semantic similarity. Experimental results demonstrate significant improvements in both the coherence and relevance of the texts generated by CECS, outperforming existing Contrastive Search techniques. The proposed algorithm has several potential real-world applications, including legal document drafting, customer service chatbots, and content marketing.

In recent years, Large Language Models (LLMs) have transformed the field of Natural Language Processing (NLP), delivering cutting-edge performance across numerous tasks, including text generation, summarization, machine translation, and question answering. Models such as OpenAI's GPT-3 [1], Google's BERT [2], and more recently PaLM [3] have greatly enhanced the capabilities of machines in understanding and generating human language. By leveraging deep neural network architectures and training on extensive datasets, LLMs have made significant strides in producing fluent and coherent text that closely resembles human communication. Generating text from an LLM involves more than simply predicting the next word in a sequence according to its probability distribution. This step, known as decoding, plays a critical role in shaping the final output. Various decoding strategies have been proposed in the literature, ranging from deterministic methods such as beam search to stochastic methods like top-k and nucleus sampling. While deterministic methods choose the highest-probability token at each step, their stochastic counterparts introduce randomness to improve diversity in the generated output.
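For reference, the baseline that CECS extends scores each top-k candidate by model confidence minus a degeneration penalty (the candidate's maximum similarity to tokens already generated). The sketch below shows that standard Contrastive Search scoring rule; CECS's contextual calibration, multi-level search, and adaptive temperature are not reproduced here.

```python
# Standard Contrastive Search candidate scoring:
#   score(v) = (1 - alpha) * p(v | context) - alpha * max_j cos(h_v, h_j)
import torch
import torch.nn.functional as F

def contrastive_score(probs, cand_h, prev_h, alpha=0.6):
    """probs: (k,) top-k candidate probabilities;
    cand_h: (k, d) candidate hidden states;
    prev_h: (t, d) hidden states of tokens generated so far."""
    sim = F.cosine_similarity(cand_h.unsqueeze(1), prev_h.unsqueeze(0), dim=-1)
    degeneration = sim.max(dim=1).values      # similarity to prior tokens
    return (1 - alpha) * probs - alpha * degeneration

probs = torch.tensor([0.5, 0.3, 0.2])         # toy top-3 candidates
cand_h = torch.randn(3, 8)
prev_h = torch.randn(5, 8)
best = contrastive_score(probs, cand_h, prev_h).argmax()  # token to emit
```

The `alpha` knob trades model confidence against repetition; CECS's adaptive temperature control can be read as making this trade-off context-dependent rather than fixed.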


Heterogeneous Calibration: A post-hoc model-agnostic framework for improved generalization

Durfee, David, Gupta, Aman, Basu, Kinjal

arXiv.org Machine Learning

We introduce the notion of heterogeneous calibration, which applies a post-hoc model-agnostic transformation to model outputs to improve AUC performance on binary classification tasks. We consider overconfident models, whose performance is significantly better on training versus test data, and give intuition into why they might under-utilize moderately effective simple patterns in the data. We refer to these simple patterns as heterogeneous partitions of the feature space and show theoretically that perfectly calibrating each partition separately optimizes AUC. This gives a general paradigm of heterogeneous calibration as a post-hoc procedure by which heterogeneous partitions of the feature space are identified through tree-based algorithms and post-hoc calibration techniques are applied to each partition to improve AUC. While the theoretical optimality of this framework holds for any model, we focus on deep neural networks (DNNs) and test the simplest instantiation of this paradigm on a variety of open-source datasets. Experiments demonstrate the effectiveness of this framework and the future potential for applying higher-performing partitioning schemes along with more effective calibration techniques.
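A minimal sketch of the paradigm as the abstract lays it out: a shallow tree partitions the feature space, and the model's scores are calibrated separately within each leaf (Platt scaling here; the framework leaves both the partitioner and calibrator pluggable). Function names and hyperparameters are illustrative assumptions.

```python
# Tree-based partitioning + per-partition Platt scaling of model scores.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

def fit_heterogeneous_calibration(X, scores, y, max_leaf_nodes=8):
    tree = DecisionTreeClassifier(max_leaf_nodes=max_leaf_nodes).fit(X, y)
    leaves = tree.apply(X)                     # leaf id = partition id
    calibrators = {}
    for leaf in np.unique(leaves):
        m = leaves == leaf
        if len(np.unique(y[m])) == 2:          # Platt scaling needs both classes
            calibrators[leaf] = LogisticRegression().fit(
                scores[m].reshape(-1, 1), y[m])
    return tree, calibrators

def predict_calibrated(tree, calibrators, X, scores):
    leaves = tree.apply(X)
    out = scores.astype(float).copy()          # fall back to raw scores
    for leaf, cal in calibrators.items():
        m = leaves == leaf
        out[m] = cal.predict_proba(scores[m].reshape(-1, 1))[:, 1]
    return out
```

Note that monotone calibration within one partition cannot change AUC; the AUC gain comes from re-ranking scores *across* partitions, which is the point of the paper's theory.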


Autonomous Control of a Line Follower Robot Using a Q-Learning Controller

Saadatmand, Sepehr, Azizi, Sima, Kavousi, Mohammadamir, Wunsch, Donald

arXiv.org Machine Learning

In this paper, a MIMO simulated annealing (SA)-based Q-learning method is proposed to control a line follower robot. The conventional controller for these types of robots is the proportional (P) controller. Considering the unknown mechanical characteristics of the robot and uncertainties such as friction and slippery surfaces, system modeling and controller design can be extremely challenging. The mathematical model of the robot is presented in this paper, and a simulator is designed based on this model. Basic Q-learning methods rely on pure exploitation, while the epsilon-greedy methods that encourage exploration can harm controller performance after learning is complete by exploring non-optimal actions. The simulated annealing-based Q-learning method tackles this drawback by decreasing the exploration rate as learning progresses. Simulation and experimental results are provided to evaluate the effectiveness of the proposed controller.
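One common reading of "simulated annealing based Q-learning" is Boltzmann (softmax) action selection with an annealed temperature, so exploration fades as learning progresses, shown in the toy sketch below. The environment dynamics, reward, and all hyperparameters are placeholder assumptions; the paper's MIMO setup is not reproduced.

```python
# Annealed Boltzmann exploration over a Q-table: high temperature explores,
# low temperature exploits, and cooling moves smoothly from one to the other.
import numpy as np

rng = np.random.default_rng(0)

def sa_action(Q, state, temperature):
    prefs = Q[state] / max(temperature, 1e-8)
    prefs -= prefs.max()                       # numerical stability
    p = np.exp(prefs) / np.exp(prefs).sum()
    return rng.choice(len(p), p=p)

Q = np.zeros((10, 4))                          # toy: 10 states, 4 steering actions
T, cooling, alpha, gamma = 1.0, 0.995, 0.1, 0.9
state = 0
for episode in range(1000):
    action = sa_action(Q, state, T)
    next_state = (state + 1) % 10              # placeholder dynamics
    reward = 1.0 if action == 0 else -0.1      # placeholder reward: action 0 = on line
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max()
                                 - Q[state, action])
    state = next_state
    T *= cooling                               # anneal: explore less over time
```

Unlike a fixed epsilon-greedy policy, the annealed temperature stops injecting non-optimal actions once learning has converged, which is the drawback the paper targets.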


Linking Search Space Structure, Run-Time Dynamics, and Problem Difficulty: A Step Toward Demystifying Tabu Search

Watson, J. P., Whitley, L. D., Howe, A. E.

Journal of Artificial Intelligence Research

Tabu search is one of the most effective heuristics for locating high-quality solutions to a diverse array of NP-hard combinatorial optimization problems. Despite the widespread success of tabu search, researchers have a poor understanding of many key theoretical aspects of this algorithm, including models of the high-level run-time dynamics and identification of those search space features that influence problem difficulty. We consider these questions in the context of the job-shop scheduling problem (JSP), a domain where tabu search algorithms have been shown to be remarkably effective. Previously, we demonstrated that the mean distance between random local optima and the nearest optimal solution is highly correlated with problem difficulty for a well-known tabu search algorithm for the JSP introduced by Taillard. In this paper, we discuss various shortcomings of this measure and develop a new model of problem difficulty that corrects these deficiencies. We show that Taillard's algorithm can be modeled with high fidelity as a simple variant of a straightforward random walk. The random walk model accounts for nearly all of the variability in the cost required to locate both optimal and sub-optimal solutions to random JSPs, and provides an explanation for differences in the difficulty of random versus structured JSPs. Finally, we discuss and empirically substantiate two novel predictions regarding tabu search algorithm behavior. First, the method for constructing the initial solution is highly unlikely to impact the performance of tabu search. Second, tabu tenure should be selected to be as small as possible while simultaneously avoiding search stagnation; values larger than necessary lead to significant degradations in performance.
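To make the paper's central parameter concrete, here is a minimal tabu search skeleton showing how tabu tenure works: a recently reversed move is forbidden for `tenure` iterations unless it would beat the best solution found (the standard aspiration criterion). The bit-flip neighborhood is a generic stand-in, not Taillard's JSP move operator.

```python
# Generic tabu search over bit strings, illustrating tenure and aspiration.
import random

def tabu_search(cost, n_bits, tenure=7, iters=500):
    x = [random.randint(0, 1) for _ in range(n_bits)]
    best, best_cost = x[:], cost(x)
    tabu = {}                                   # move -> iteration it frees up
    for it in range(iters):
        candidates = []
        for i in range(n_bits):
            y = x[:]
            y[i] ^= 1
            c = cost(y)
            # aspiration: a tabu move is allowed if it beats the best known
            if tabu.get(i, -1) <= it or c < best_cost:
                candidates.append((c, i, y))
        c, i, x = min(candidates)               # best admissible neighbor
        tabu[i] = it + tenure                   # reversing move i is now tabu
        if c < best_cost:
            best, best_cost = x[:], c
    return best, best_cost

# toy objective: Hamming distance to an arbitrary target string
target = [random.randint(0, 1) for _ in range(20)]
sol, c = tabu_search(lambda b: sum(bi != ti for bi, ti in zip(b, target)), 20)
```

The paper's prediction maps directly onto the `tenure` parameter here: too small and the walk stagnates by revisiting solutions, while values larger than necessary exclude good neighbors and degrade performance.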